perf(version): wrap Ranges in Rc inside VersionRange#73

Merged
doubleailes merged 1 commit into
mainfrom
rc-version-range
May 15, 2026
@doubleailes
Owner

Summary

Callgrind on the rez 188-case benchmark (post-#66/#67/#68/#70/#71/#72) put `SmallVec::extend` + `Drop` at ~4 % of cycles, almost entirely from `VersionRange::clone` deep-copying the inner `Ranges`'s `SmallVec` of `(Bound, Bound)` segments. `VersionRange` clones happen everywhere on the hot path — `Requirement::clone` (extract loop, per-pair `reduce_by`, `Reduction` construction, ...) — so they accumulate.

Switch the inner from `Ranges` to `Rc<Ranges>`:

  • `Rc::clone` is a refcount bump (no heap alloc).
  • `Rc<T>`'s `Hash` and `PartialEq` impls delegate to the inner `T`, so the derived semantics on `VersionRange` are unchanged.
  • Methods that build a new range (`intersection`, `union`, `complement`, `from_versions`, `span`, `split`, ...) still produce a fresh `Ranges` internally and wrap it with `Rc::new`. The win is on the read/clone path, not the construction path.
  • `as_ranges()` continues to return `&Ranges` (via `Rc::deref`). `into_ranges()` now uses `Rc::unwrap_or_clone` — falls back to a clone only if the `Rc` is shared. This is the consume-the-`VersionRange` API; rare in practice and unused by the solver.
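
The bullets above can be sketched as follows. This is a minimal illustration, not the code from `rer-version`: the `Ranges` type here is a hypothetical stand-in that stores plain `(u32, u32)` segments in a `Vec`, and `from_segments` and `shares_storage_with` are helper names invented for the example.

```rust
use std::rc::Rc;

// Hypothetical stand-in for the real `Ranges<RerVersion>`; the real type
// keeps `(Bound, Bound)` segments in a `SmallVec`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Ranges {
    pub segments: Vec<(u32, u32)>,
}

// The derives compile unchanged because `Rc<T>` forwards `Hash`,
// `PartialEq` and `Eq` to `T`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct VersionRange {
    inner: Rc<Ranges>,
}

impl VersionRange {
    // Illustrative constructor: builders still allocate a fresh `Ranges`
    // and wrap it once with `Rc::new`.
    pub fn from_segments(segments: Vec<(u32, u32)>) -> Self {
        VersionRange { inner: Rc::new(Ranges { segments }) }
    }

    // Borrowing accessor: `&Rc<Ranges>` deref-coerces to `&Ranges`.
    pub fn as_ranges(&self) -> &Ranges {
        &self.inner
    }

    // Consuming accessor: moves the `Ranges` out if this is the only
    // handle, and clones it only when the `Rc` is shared.
    pub fn into_ranges(self) -> Ranges {
        Rc::unwrap_or_clone(self.inner)
    }

    // Illustrative helper: true when both values point at one allocation.
    pub fn shares_storage_with(&self, other: &VersionRange) -> bool {
        Rc::ptr_eq(&self.inner, &other.inner)
    }
}
```

With this shape, `VersionRange::clone` copies a pointer and increments a counter, so the segment list is never duplicated on the read path. `Rc::unwrap_or_clone` requires Rust 1.76 or later.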

Single-file change in `rer-version`.

Benchmark (188 cases, release, same machine, two runs)

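| Stage                          | Total   | Mean   | vs rez |
|--------------------------------|--------:|-------:|-------:|
| Baseline (post-#72), median    | ~12.7 s | ~68 ms | ~30×   |
| + this change, run 1           |  11.2 s |  60 ms |  34.1× |
| + this change, run 2           |  11.3 s |  60 ms |  33.7× |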

~11 % additive on top of #72. Predicted 3–5 % — came in higher because `VersionRange::clone` cascades into more than just the `SmallVec::extend` it shows up as in the exclusive view; it also drives allocator-side work and the matching `Drop`s.

Cumulative from main: 43.0 s → 11.2 s (−74 %), 8.8× rez → 34.1× rez. The differential test also got faster: 188/188 still match rez 1:1, in 17.73 s (was 20.70 s).

Correctness

  • `cargo build` — clean.
  • `cargo test` — passes (no test changes needed; derived `Hash`/`Eq` semantics preserved).
  • `cargo test --release -p rer-resolver --test test_rez_benchmark -- --ignored` — 188/188 still match rez 1:1.

🤖 Generated with Claude Code

Callgrind on rez's 188-case benchmark (post-#71/#72) showed
`SmallVec::extend` + `Drop` at ~4 % of cycles, almost entirely from
`VersionRange::clone`. Every `Requirement::clone()` (in
`extracted_request.clone()`, the per-pair `package_request.clone()`
in `reduce_by`, the `req.clone()` and `package_request.clone()` in
`Reduction`, etc.) deep-copies the inner `Ranges`'s `SmallVec` of
`(Bound, Bound)` segments. After the rest of the perf stack
(#66/#67/#68/#70/#71/#72), this is the largest non-amortised
allocation cost left.

Switch the inner from `Ranges<RerVersion>` to `Rc<Ranges<RerVersion>>`.
`Rc<T>::clone` is a refcount bump; `Rc<T>::Hash`/`Eq` defer to the
inner `T`, so the derived semantics on `VersionRange` are unchanged.
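
A small sketch of why the derived `Hash`/`Eq` semantics survive the change. The `Plain` and `Shared` structs and the `hash_of` helper are hypothetical names for this example, with a `Vec<(u32, u32)>` standing in for the real `Ranges<RerVersion>`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Hash a value with the standard library's default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Field stored inline, as before the change.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Plain {
    pub segments: Vec<(u32, u32)>,
}

// Same field behind an `Rc`, as after the change.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Shared {
    pub segments: Rc<Vec<(u32, u32)>>,
}

// Two `Shared` values built independently: equal contents, separate
// allocations. Equality compares the pointed-to data, not the pointers.
pub fn equal_but_distinct() -> (Shared, Shared) {
    let a = Shared { segments: Rc::new(vec![(1, 2), (4, 7)]) };
    let b = Shared { segments: Rc::new(vec![(1, 2), (4, 7)]) };
    (a, b)
}
```

Under these assumptions, `hash_of` gives the same value for a `Plain` and a `Shared` holding the same segments, and the two values from `equal_but_distinct` compare equal even though `Rc::ptr_eq` on their fields is false. That is why no test changes were needed.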

Methods that build a new range (`intersection`, `union`, `complement`,
`from_versions`, `span`, `split`, ...) still produce a fresh `Ranges`
internally and wrap it with `Rc::new` — the win is on the read /
clone path, not the construction path.

`as_ranges()` still returns `&Ranges` (via `Rc::deref`). `into_ranges`
now uses `Rc::unwrap_or_clone` — falls back to a clone if the `Rc` is
shared, but is the consume-the-`VersionRange` API and rare in
practice.

## Benchmark (188 cases, release, same machine, two runs)

| Stage                              | Total   | Mean   | vs rez |
|------------------------------------|--------:|-------:|-------:|
| Baseline (post-#71/#72), median    | ~12.7 s |  68 ms | ~30×   |
| + this change, run 1               |  11.2 s |  60 ms |  34.1× |
| + this change, run 2               |  11.3 s |  60 ms |  33.7× |

**~11 % on top of #72**, **~74 % cumulative from main** (43.0 s →
11.2 s, 8.8× rez → 34.1× rez).

Differential test (`cargo test … --ignored`): 17.73 s, **188/188 still
match rez 1:1**.

Predicted 3–5 %. The slightly bigger gain reflects that
`VersionRange::clone` cascades into a lot more than just the
`SmallVec::extend` it was attributed to in the callgrind exclusive
view — it also drove allocator-side work and the matching `Drop`s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@doubleailes doubleailes merged commit 7d75aee into main May 15, 2026
24 checks passed
@doubleailes doubleailes deleted the rc-version-range branch May 15, 2026 15:31
doubleailes added a commit that referenced this pull request May 16, 2026
…hemerals

The Vec-returning accessors (`Solver::resolved_packages` /
`resolved_ephemerals`, `ResolvePhase::solved_variants` /
`solved_ephemerals`) force callers that only need to iterate to pay
for an intermediate `Vec` plus, for ephemerals, an owning clone of
each `Requirement`. Add borrowing-iterator siblings so future
consumers can stream without allocating.

New API, additive — every existing Vec method keeps its signature:

  Solver::resolved_packages_iter()  -> Option<impl Iterator<Item = Rc<PackageVariant>>>
  Solver::resolved_ephemerals_iter() -> Option<impl Iterator<Item = &Requirement>>

  ResolvePhase::iter_solved_variants()   -> impl Iterator<Item = Rc<PackageVariant>>
  ResolvePhase::iter_solved_ephemerals() -> impl Iterator<Item = &Requirement>

`iter_solved_variants` still has to clone `self.scopes` internally
because `get_solved_variant` takes `&mut self` (it triggers a
deferred sort on the variant slice); the saving vs the Vec form is
the trailing `.collect()`. `iter_solved_ephemerals` is the bigger
win — pure borrow, zero allocation, no per-element clone.

Refactor the existing Vec methods to delegate to the iter forms so
there's one implementation of the filter logic.
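
A minimal sketch of the delegation pattern described above, under simplifying assumptions: `Requirement` is reduced to a name plus an `ephemeral` flag, the `Solver` holds a flat list and a `solved` flag, and `Solver::new` is a hypothetical constructor. Only the two `resolved_ephemerals*` method names come from the commit message.

```rust
// Simplified stand-in for the real `Requirement`.
#[derive(Clone, Debug, PartialEq)]
pub struct Requirement {
    pub name: String,
    pub ephemeral: bool,
}

// Simplified stand-in for the real solver state.
pub struct Solver {
    solved: bool,
    requirements: Vec<Requirement>,
}

impl Solver {
    // Hypothetical constructor, for the example only.
    pub fn new(solved: bool, requirements: Vec<Requirement>) -> Self {
        Solver { solved, requirements }
    }

    // Borrowing form: no intermediate `Vec`, no per-element clone.
    // Returns `None` when the solve did not succeed.
    pub fn resolved_ephemerals_iter(
        &self,
    ) -> Option<impl Iterator<Item = &Requirement> + '_> {
        if !self.solved {
            return None;
        }
        Some(self.requirements.iter().filter(|r| r.ephemeral))
    }

    // Owning form keeps its signature and delegates to the iterator, so
    // the filter logic lives in one place.
    pub fn resolved_ephemerals(&self) -> Option<Vec<Requirement>> {
        self.resolved_ephemerals_iter()
            .map(|it| it.cloned().collect())
    }
}
```

Callers that only iterate, such as code that stringifies each entry, can use the `_iter` form and pay for neither the `Vec` nor the clones. Callers that need ownership keep using the `Vec` form unchanged.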

## pyrer wired through

`crates/rer-python/src/lib.rs` switches its `SolveResult` build to
use `resolved_packages_iter` / `resolved_ephemerals_iter`. Two
intermediate `Vec`s skipped per solved result; for ephemerals every
entry is now read by reference instead of cloned then stringified.

## Tests

- `test_iter_resolved_packages_matches_vec_form` and
  `test_iter_resolved_ephemerals_matches_vec_form` in
  `solver::tests` — confirm iter and Vec forms agree on the same
  input and that the iter form returns `None` on a failed solve.
- Existing 5 Python tests for `resolved_ephemerals` still pass
  (the FFI surface is unchanged; just the implementation under it).

## Verification

- `cargo test` (Rust): 41/41 unit tests pass (39 existing + 2 new).
- `cargo test … --ignored` (188-case differential): 188/188 still
  match rez 1:1 in 17.68 s.
- `pytest tests/` (all Python): 80/80.

## Perf (188-case benchmark, same machine as README reference)

| | Total | Mean | Median |
|---|---:|---:|---:|
| README reference (post-#73)          | 11.35 s | 60 ms | 30 ms |
| This branch, pre iter-forms (run 1)  | 11.19 s | 60 ms | 28 ms |
| This branch, pre iter-forms (run 2)  | 11.27 s | 60 ms | 33 ms |
| This branch, post iter-forms (run 1) | 10.90 s | 58 ms | 28 ms |
| This branch, post iter-forms (run 2) | 11.16 s | 59 ms | 30 ms |

Within run-to-run noise; if anything a slight improvement from the
avoided allocations. No regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>